1.0 Introduction

Suppose we wish to estimate P(X ≤ c), where X is a continuous random
variable and c is some constant. That is, we wish to estimate F(c), where F
is the cdf underlying X. Since F is unknown, a model cdf G(x;θ) is selected,
and the vector of parameters θ = (θ_1, θ_2, ..., θ_p) may be estimated from a
random sample. In this situation some care should be taken with respect to the
method of estimation of θ. Since we do not know the form of F and have only
postulated that the form is G(x;θ), we should use a method of estimation of θ
which is robust. That is, the method by which θ is estimated should result
in a value of G(x;θ̂) which is as close as possible to F(x) even when G(x;θ)
is different from F(x).

Also, as any applied statistician knows, data samples often contain
some contamination or outliers which are sometimes difficult to detect. The
estimation technique used should be robust in the sense that it should not be
too sensitive to the contamination.


2.0 Methodology

Samples of size N were generated from various underlying distributions in
the following manner. First, k random samples of size N were generated from
the unit interval, with y_ij being the i-th individual in the j-th sample. Using

\[ x_{ij} = F^{-1}(y_{ij}), \]

k random samples x_1j, x_2j, ..., x_Nj from F(x) were obtained.
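The generation step can be sketched as follows. This is a minimal illustration, assuming the Weibull form F(x) = 1 - exp(-alpha * x**beta) used later in the paper. The function names and the seed are hypothetical and are not taken from the original programs.

```python
import math
import random


def weibull_inverse_cdf(y, alpha, beta):
    """Invert F(x) = 1 - exp(-alpha * x**beta) for 0 <= y < 1."""
    return (-math.log(1.0 - y) / alpha) ** (1.0 / beta)


def generate_samples(inverse_cdf, k, n, seed=0):
    """Return k samples of size n: draw y_ij uniformly on [0, 1), then apply F^{-1}."""
    rng = random.Random(seed)  # the seed is an arbitrary choice for repeatability
    return [[inverse_cdf(rng.random()) for _ in range(n)] for _ in range(k)]


# k = 75 samples of size N = 25, the sizes used in the reported results.
samples = generate_samples(lambda y: weibull_inverse_cdf(y, 1.0, 4.0), k=75, n=25)
```

Any distribution with a computable inverse cdf can be substituted for the Weibull inverse in the call above.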

We considered four different families of cases as follows:

(i) F(x) is a cdf such as logistic, Weibull, Laplace, etc.
(ii) F(x) is a mixture of two different normal cdf's
(iii) F(x) is a normal cdf with contamination
(iv) F(x) is a Weibull distribution with contamination and without
contamination.
We used the normal model for G(x;θ) in (i), (ii), (iii), and in case (iv)
a Weibull model for G(x;θ) was used. In all of the above cases we estimated
the parameters in the model cdf using the maximum likelihood method and a
least squares technique.

The least squares estimates were obtained by regressing the model cdf
G(x;θ) on the empirical cdf, which in our case requires non-linear regression.
The empirical cdf^1 may be defined as below

\[ F_N(x) = \begin{cases} \dfrac{2i-1}{2N}, & x_{(i)} \le x < x_{(i+1)}, \\ 0, & x < x_{(1)}, \end{cases} \]

where x_(i) is the i-th order statistic. Now, the vector of parameters θ is
estimated by selecting those values such that

\[ \sum_{i=1}^{N} \left[ G(x_{(i)};\theta) - F_N(x_{(i)}) \right]^2 \]

is minimized.
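As a rough sketch of the criterion being minimized, the code below evaluates the sum of squared differences between a normal model cdf and the empirical cdf at the order statistics. It assumes the plotting position (2i - 1)/(2N) discussed in the footnote. The function names and the small data set are illustrative only.

```python
from statistics import NormalDist


def plotting_positions(n):
    """Empirical cdf values (2i - 1) / (2N) at the order statistics, i = 1..N."""
    return [(2 * i - 1) / (2 * n) for i in range(1, n + 1)]


def sum_of_squares(sample, mu, sigma):
    """Sum over i of [G(x_(i); mu, sigma) - F_N(x_(i))]^2 for a normal model G."""
    model = NormalDist(mu, sigma)
    ordered = sorted(sample)
    positions = plotting_positions(len(ordered))
    return sum((model.cdf(x) - p) ** 2 for x, p in zip(ordered, positions))


sample = [-1.2, -0.4, 0.1, 0.3, 1.5]      # made-up values for illustration
print(sum_of_squares(sample, 0.0, 1.0))   # criterion at one trial value of theta
print(sum_of_squares(sample, 5.0, 1.0))   # a poor trial value gives a larger sum
```

The least squares estimate is the pair (mu, sigma) at which this function is smallest.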

Since the above minimization does not usually yield a linear system of normal
equations, the parameters must be estimated using non-linear techniques. We
used the linearization or Taylor series method, which is described in Draper
and Smith (1966). There are a number of other, possibly more efficient, methods.
However, the linearization method gave us good results with respect to computer
time, and it was easily programmed in SAS MATRIX. Almost all non-linear
techniques require initial values to estimate the parameters.

1. Often i/N is used in place of (2i-1)/(2N) in the definition of F_N(x). A more
general form is (i-c)/(N-2c+1), where the value of c depends on the
distribution. The values of 0 and 3/8 are then used for the uniform and normal
distributions, respectively. For further details see Hahn and Shapiro (1967).

One method used when θ = (θ_1, θ_2) was as follows:

(i) Let x_(i) be the i-th order statistic such that i/N is between
.15 and .25, and let x_(j) be the j-th order statistic such that
j/N is between .75 and .85. That is, select two order
statistics, one in the lower and one in the upper tail.

(ii) Let

\[ G(x_{(i)};\theta_1,\theta_2) = i/N \]
\[ G(x_{(j)};\theta_1,\theta_2) = j/N \]

and solve the system for θ_1 and θ_2.

This works in some cases, such as the case when G(x;θ) is the Weibull
distribution.
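The two steps above, followed by linearization iterations, can be sketched for a Weibull model as below. This is an illustration under stated assumptions, not the SAS MATRIX program used for the paper. It assumes the form G(x) = 1 - exp(-alpha * x**beta), strictly positive data, and the plotting position (2i - 1)/(2N). The step-halving safeguard and all function names are additions made for this sketch.

```python
import math


def weibull_cdf(x, alpha, beta):
    return 1.0 - math.exp(-alpha * x ** beta)


def start_values(ordered):
    """Solve G(x_(i)) = i/N and G(x_(j)) = j/N for (alpha, beta), i/N near .2, j/N near .8."""
    n = len(ordered)
    i, j = max(1, round(0.2 * n)), min(n - 1, round(0.8 * n))
    xi, xj = ordered[i - 1], ordered[j - 1]
    # -log(1 - p) = alpha * x**beta at both points gives beta first, then alpha.
    hi, hj = -math.log(1.0 - i / n), -math.log(1.0 - j / n)
    beta = math.log(hj / hi) / math.log(xj / xi)
    return hi / xi ** beta, beta


def sse(ordered, alpha, beta):
    n = len(ordered)
    return sum((weibull_cdf(x, alpha, beta) - (2 * r - 1) / (2 * n)) ** 2
               for r, x in enumerate(ordered, start=1))


def gauss_newton(ordered, alpha, beta, iterations=50):
    """Linearization (Gauss-Newton) iterations with step halving (an added safeguard)."""
    n = len(ordered)
    current = sse(ordered, alpha, beta)
    for _ in range(iterations):
        saa = sab = sbb = ga = gb = 0.0
        for r, x in enumerate(ordered, start=1):
            resid = (2 * r - 1) / (2 * n) - weibull_cdf(x, alpha, beta)
            e = math.exp(-alpha * x ** beta)
            da = x ** beta * e                          # dG/d(alpha)
            db = alpha * x ** beta * math.log(x) * e    # dG/d(beta)
            saa += da * da
            sab += da * db
            sbb += db * db
            ga += da * resid
            gb += db * resid
        det = saa * sbb - sab * sab
        if det == 0.0:
            break
        # Solve the 2 x 2 linear normal equations of the linearized problem.
        step_a = (sbb * ga - sab * gb) / det
        step_b = (saa * gb - sab * ga) / det
        scale = 1.0
        for _ in range(30):
            a, b = alpha + scale * step_a, beta + scale * step_b
            if a > 0 and b > 0 and sse(ordered, a, b) < current:
                break
            scale /= 2.0
        else:
            return alpha, beta      # no improving step found: stop
        alpha, beta, current = a, b, sse(ordered, a, b)
    return alpha, beta


# Check on artificial data lying exactly on a Weibull (1, 2) cdf.
n = 25
ordered = [(-math.log(1 - (2 * r - 1) / (2 * n))) ** 0.5 for r in range(1, n + 1)]
a0, b0 = start_values(ordered)
a1, b1 = gauss_newton(ordered, a0, b0)
```

On this artificial data the starting values are already close to (1, 2), and the iterations refine them toward the exact parameters.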




The maximum likelihood estimate of θ is the value of θ which maximizes the
likelihood function L. If the probability density function for the distribution
is given by g(x,θ), and x_1, x_2, ..., x_N are N values chosen randomly from the
distribution, then L is given by

\[ L = g(x_1,\theta)\, g(x_2,\theta) \cdots g(x_N,\theta) . \]

The maximum likelihood estimate of θ is usually obtained by differentiating
L with respect to θ, setting the resulting derivative equal to zero, and solving
for the value of θ. For most distributions, the solution is not straightforward
and is best obtained by the use of an iterative technique.
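One possible iterative technique is sketched below for a Weibull model with cdf F(x) = 1 - exp(-alpha * x**beta). Setting the derivative of log L with respect to alpha to zero gives alpha = N / sum(x**beta). Substituting this into the derivative with respect to beta leaves a single equation in beta, solved here by bisection. The choice of bisection, the function name, and the simulated data are assumptions made for illustration. The paper does not say which iterative method was used.

```python
import math
import random


def weibull_mle(sample, tol=1e-10):
    """MLE of (alpha, beta) for F(x) = 1 - exp(-alpha * x**beta); needs x > 0, not all equal."""
    n = len(sample)
    mean_log = sum(math.log(x) for x in sample) / n

    def score(beta):
        # Derivative of log L in beta (divided by N) after substituting
        # alpha = N / sum(x**beta); this function is decreasing in beta.
        powers = [x ** beta for x in sample]
        weighted = sum(p * math.log(x) for p, x in zip(powers, sample))
        return 1.0 / beta + mean_log - weighted / sum(powers)

    lo, hi = 1e-6, 1.0
    while score(hi) > 0.0:          # expand until the root is bracketed
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:            # bisection on the score equation
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    alpha = n / sum(x ** beta for x in sample)
    return alpha, beta


rng = random.Random(3)  # arbitrary seed
# Simulated Weibull data with alpha = 1, beta = 2 (inverse cdf method).
data = [(-math.log(1.0 - rng.random())) ** 0.5 for _ in range(2000)]
alpha_hat, beta_hat = weibull_mle(data)
```

With 2000 simulated observations the estimates should fall close to the generating values alpha = 1 and beta = 2.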


3.0 Comparison of Maximum Likelihood and Least Squares

Local and global errors were calculated as a means to compare the two
estimation techniques with respect to the underlying cdf. Let C_P be the
value such that F(C_P) = P. The local error for a specified value of P for
the j-th sample is given by

\[ d_{Pj} = G(C_P;\hat\theta_j) - P . \]

We define errors E_1 (local), E_2 and E_3 (global) as follows:

\[ E_1(P) = \sum_{j=1}^{k} |d_{Pj}| \,/\, k \]  (the average local error at P over k samples)

\[ E_2 = \sum_{j=1}^{k} \Big( \max_{P} |d_{Pj}| \Big) \,/\, k \]

\[ E_3 = \sum_{j=1}^{k} \sum_{\text{all } P} |d_{Pj}| \,/\, (kL) \]

where L is the number of values of P used.

For E_2 and E_3 the max and sum were taken over the grid {.01, .05, .1, .2,
.3, .4, .5, .6, .7, .8, .9, .95, .99}, and L, the cardinality of this set, is
13. In all of the results reported in this paper k = 75. That is, the results
are based on 75 random samples of size N. N = 25 in all of the reported
results. However, we did use samples of size 10 and 75, and observed the same
general pattern as that for samples of size 25.
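The error measures can be computed as in the sketch below. It assumes E_1(P) is the mean absolute local error over the k samples, E_2 the mean over samples of the largest absolute local error on the grid, and E_3 the mean absolute local error over all samples and grid points. The simulation uses a normal model fitted by maximum likelihood to N(0,1) data. The seed and function names are illustrative.

```python
import random
from statistics import NormalDist, fmean, pstdev

GRID = [.01, .05, .1, .2, .3, .4, .5, .6, .7, .8, .9, .95, .99]


def local_errors(true_quantile, fitted_cdf, grid=GRID):
    """d_P = G(C_P; theta_hat) - P, where C_P is the true P-th quantile."""
    return [fitted_cdf(true_quantile(p)) - p for p in grid]


def error_measures(d):
    """d[j][p] is the local error for sample j at the p-th grid value."""
    k, L = len(d), len(d[0])
    e1 = [sum(abs(row[p]) for row in d) / k for p in range(L)]
    e2 = sum(max(abs(v) for v in row) for row in d) / k
    e3 = sum(abs(v) for row in d for v in row) / (k * L)
    return e1, e2, e3


rng = random.Random(1)  # arbitrary seed
truth = NormalDist(0.0, 1.0)
d = []
for _ in range(75):                                   # k = 75 samples
    sample = [rng.gauss(0.0, 1.0) for _ in range(25)]  # N = 25
    fit = NormalDist(fmean(sample), pstdev(sample))    # normal-model MLE
    d.append(local_errors(truth.inv_cdf, fit.cdf))
e1, e2, e3 = error_measures(d)
```

Because E_3 averages the same absolute errors over which E_2 takes a maximum, E_3 can never exceed E_2.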


3.1 Normal Model Used on Various Distributions

Table 3.1 compares the method of maximum likelihood and least squares when
the data is sampled (generated) from each of several different distributions
and the model used is the normal distribution. When the sample comes from the
normal, Laplace, rectangular, symmetric triangular, or Cauchy distribution,
the results are independent of the values of the parameters of the distribution.
This is not the case for the Weibull and gamma. The forms we used are given by

\[ F(x;\alpha,\beta) = 1 - e^{-\alpha x^{\beta}} \quad \text{(Weibull)} \]

\[ F(x;\eta,\lambda) = \frac{\lambda^{\eta}}{\Gamma(\eta)} \int_0^x t^{\eta-1} e^{-\lambda t}\, dt \quad \text{(gamma)}. \]

It is to be expected that for a normal distribution and a normal model,
the MLE would be superior. However, the difference is small. For the logistic
distribution the two methods differ little. For the rectangular and triangular
distributions, the MLE is superior, while for the Laplace and the Cauchy, the
LSE is superior. The superiority of the LSE is substantial in the case of the
Cauchy distribution.

The Weibull is well approximated by the normal when β is between 2 and 6,
as is the gamma distribution for large values of η (the skewness and kurtosis
of the gamma distribution are given by 2/√η and 3 + 6/η respectively). Because
of this it is not surprising that the MLE out-performs the LSE for the Weibull
(1,4) and gamma (9,2) and the LSE is superior for Weibull (1,1), gamma (3,3)
and gamma (2,1).
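As a worked illustration of the skewness and kurtosis formulas just quoted, the values for the three gamma cases can be set beside the normal values of 0 and 3. Here η denotes the gamma shape parameter, the first argument in gamma (9,2). The arithmetic is ours and is not taken from the original tables.

```latex
\begin{align*}
\eta = 9:&\quad \text{skewness} = 2/\sqrt{9} \approx 0.67, \quad \text{kurtosis} = 3 + 6/9 \approx 3.67 \\
\eta = 3:&\quad \text{skewness} = 2/\sqrt{3} \approx 1.15, \quad \text{kurtosis} = 3 + 6/3 = 5 \\
\eta = 2:&\quad \text{skewness} = 2/\sqrt{2} \approx 1.41, \quad \text{kurtosis} = 3 + 6/2 = 6
\end{align*}
```

Only the η = 9 case comes near the normal values, which is consistent with the MLE doing better there.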

The difference between the two methods when the Cauchy distribution is the
true underlying distribution is illustrated in Figure 3.1. Figure 3.1 shows
the true cdf (the Cauchy distribution), the empirical cdf, and the two normal
model cdf's using the MLE and LSE for μ and σ for a typical sample. This graph
shows the normal/LSE model giving a much closer approximation to the Cauchy
than the normal/MLE model.

3.2  Normal  Model  Used  on  the  Mixture  of  Two  Normals 

Table 3.2 gives the results for the MLE and LSE when the data is from a
mixture of two different normal distributions. We use (1-v) and v to denote
the respective weights of the N(0,1) and N(a,b) distributions.

When the means and variances of the two normal distributions differ but
little (that is, when a is not far from zero and b is not far from 1), there
is little difference between MLE and LSE. The LSE offers substantial
improvement when the two mixing normals differ more. Figure 3.2 illustrates
this with the graph of the mixture of two normals (.8 N(0,1) and .2 N(3,9)),
the empirical cdf, the normal/MLE, and the normal/LSE.
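To evaluate local errors for a mixture, the true quantiles C_P with F(C_P) = P are needed, and these have no closed form. A minimal sketch follows, assuming that b in N(a,b) is a variance, as the N(3,9) notation suggests. The use of bisection and the function names are choices made for this illustration.

```python
from statistics import NormalDist


def mixture_cdf(x, v, a, b):
    """F(x) = (1 - v) * Phi(x) + v * Phi((x - a) / sqrt(b)), with b a variance."""
    return ((1.0 - v) * NormalDist(0.0, 1.0).cdf(x)
            + v * NormalDist(a, b ** 0.5).cdf(x))


def mixture_quantile(p, v, a, b, tol=1e-10):
    """Find C_P with F(C_P) = P by bisection; F is increasing in x."""
    spread = 10.0 * max(1.0, b ** 0.5)
    lo, hi = min(0.0, a) - spread, max(0.0, a) + spread
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, v, a, b) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Median of the mixture shown in Figure 3.2: .8 N(0,1) and .2 N(3,9).
median = mixture_quantile(0.5, v=0.2, a=3.0, b=9.0)
```

Setting v = 0 reduces the mixture to N(0,1), which gives a quick check that the median comes out at zero.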

3.3 Normal Model Used on a Normal Distribution with Contamination

In this case, the true underlying distribution is normal. However, some
of the data has been contaminated or altered in some way. This frequently
happens in actual case studies, and sometimes it is difficult to recognize the
altered information. This differs from the previous case in that the
true underlying distribution discussed in Section 3.2 is an actual mixture of
two distinct normals, whereas in this case the true underlying distribution is
a single normal distribution.

Table  3.3  gives  the  results  for  the  two  methods.  The  underlying  distribution 
is  N(0,1)  with  a  proportion  v  of  the  data  altered.  The  altered  observations 
were  assumed  to  be  N(a,b).  In  practice  the  contamination  may  take  other  forms, 
but  the  given  alteration  serves  to  illustrate  the  effects  of  contamination  on 
the  estimation  methods.  It  seems  clear  that  the  method  of  least  squares  gives 
much  better  results  than  maximum  likelihood  even  when  only  modest  contamination 
is  present. 
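One way the contamination might be simulated is sketched below. The text does not state whether a fixed number of observations was altered or whether each observation was altered independently with probability v. This sketch assumes a fixed count of round(vN) observations and treats b as a variance. The function name and seed are hypothetical.

```python
import random


def contaminate(sample, v, a, b, rng):
    """Replace round(v * N) randomly chosen observations by N(a, b) draws."""
    altered = list(sample)
    count = round(v * len(sample))
    for index in rng.sample(range(len(sample)), count):
        altered[index] = rng.gauss(a, b ** 0.5)   # b is taken to be a variance
    return altered


rng = random.Random(2)  # arbitrary seed
clean = [rng.gauss(0.0, 1.0) for _ in range(25)]
# 4% contamination of a sample of size 25 alters a single observation.
dirty = contaminate(clean, v=0.04, a=0.0, b=9.0, rng=rng)
```

With N = 25, even the smallest nonzero proportion amounts to one altered value, which is the kind of outlier that is hard to detect by inspection.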

Figure 3.3 illustrates the difference between the two methods when a
N(0,1) distribution is contaminated (4%) with a N(0,9).

It should be noted that the LS model appears to be very stable under
contamination while the ML model seems quite sensitive to contamination.

3.4 Weibull Model Used on Weibull Data With and Without Contamination

The samples were generated from a Weibull (2,6) population with a
proportion v of the data being "contaminated." The contamination was effected
by the transformation √b x + a. Table 3.4 compares the MLE and LSE for various
proportions of contamination and combinations of a and b. When v = 0, i.e.,
there is no contamination, the MLE is slightly superior. For the three cases
where there is contamination, the LSE shows a definite superiority.


Columns: F(x); method; E1 (local errors) at P = .05, .5, .95; E2; E3

Normal 

ML 

.023 

.070 

.028 

.0861 

.0452 

LS 

.027

.071 

.031 

.0926 

.0479 

Laplace 

ML 

mm 

mgm 

.030 

.  129= 

.0574 

LS 

mm 

.032 

.1148 

.0551 

Rectangular 

ML 

.024 

.026 

.1169 

.0529 

LS 

.041 

.046 

.1235- 

.0602 

Triangular 

ML 

.024 

.077 

■flip 

■ 

.0465+ 

(Symmetric) 

LS 

.030 

.081 

mm 

.0519 

Logistic

ML 

.029 

.078 

.030 

.1024 

.0511 

LS 

.029 

.080 

.028 

.1031

.0510 

Cauchy 

ML

.139 

.077 

.166 

.2896 

.1330 

LS 

.049 

.077 

.048 

.1279 

.0606 

Weibull (1,4)

ML 

.025 

.0799 

.0418 

LS 

.028 

.0913 

.0476 

Weibull  (1,1) 

ML 

.104

.126 

.036 

.1742 

Kb 

LS 

IBSI 

.093 

.044 

.1618 

■9 

Gamma  (9,2) 

ML 

.030 

.071 

.027 

.0963 

■SB 

LS 

.034 

.073 

.031 

.1011 

warn 

Gamma  (3,3) 

ML 

■31 

.087 

■SB 

.1184 

R  | 

LS 

■9 

.078 

in 

.1160 

WSM 

Gamma  (2,1) 

ML 

mm 

.099 

.033 

.1337 

■SB 

LS 

■i 

.084 

.041 

.1287

n 

TABLE  3.1 

A  Comparison  of  Maximum  Likelihood  and  Least  Squares 
Using  the  Normal  Model 
Columns: v; a; b; method; E1 (local error) at P = .05, .5, .95; E2; E3

.08 

3 

H 

ML 

■9 

■g 

■B 

— 

.0458

■ 

LS 

HH 

mm 

H 

mBM 

.0479

.20 

3 

n 

ML 

mm 

.080 

mm 

.1048 

cn 

■ 

LS 

.062 

.1011 

Hi 

.52 

3 

■ 

ML 

n 

m 

mm 

.0386 

■ 

LS 

■Ql 

BB 

EB 

EBfl 

.0414 

00 

o 

3 

9 

ML 

■9 

mm 

.042 

.1217 

.0601 

LS 

■9 

m 

.032 

.0980 

.0489 

N> 

O 

■ 

9 

ML 

mm 

B9 

.031 

— 

— 

I 

■ 

LS 

HH 

KB 

.047 

BIB 

.52 

3 

9 

ML 

— 

.110 

.046 

!■ 

LS 

mm 

.072 

.033 

H 

.52 

0 

9 

ML

n 

Mi 

.057 

.1132 

.0608 

LS 

mm 

KB 

.037 

.0956 

.0498 

.52

0 

1/9 

ML 

— 

.071 

.033 

.1260 

.0536 

LS 

■ 

.072 

.044 

.1044 

.0516 

TABLE  3.2 


A  Comparison  Between  Maximum  Likelihood  and  Least  Squares 
Using  the  Normal  Model  on  Data  from  the  Mixture  of  Two  Normals 


A Comparison of Maximum Likelihood and Least Squares
Using the Normal Model on Contaminated Normal Data


4.0 An Example of Fitting a Weibull Model to Visibility Data Using Maximum
Likelihood and Least Squares Techniques

Table 4.1 gives the empirical cdf, the Weibull/LSE fit, and the Weibull/MLE
fit for visibility data at 10 a.m. in February at Mildenhall, England. We
were concerned with the estimation of P(X ≤ x), where x is any positive real
number and X is visibility in miles. The data was the result of approximately
ten years of observations. The object was to produce a simple formula from
which the probability of visibility events could be quickly evaluated. Such
a formula would "compact" the data, and be useful for simulation models.

The Weibull model had previously been used for a number of other locations
for various times of day and year. For the data in the table, the MLE and LSE
give very similar results. Because of its robust properties, we have preferred
the results from the LSE.


X  MILES 

0 

k 

5/16 

4  | 

n 

□ 

a 

2 

2H 

3 

■ 

5  u 

.557  j . 6  L 

1 

OBSERVED 

FREQUENCY 

.000 

.031 

.034 

| 

.047  1 

} 

.065 

.081 

.113 

.152 

.180 

■ 

.343 

g 

.453 

LSE  FIT 

.075 

.091 

.124 

.156 

.  188 
_ | 

a 

69 

.467 

.555  j .  6 

MLE FIT

j.  015 

.123 

1 

.  154  j 

.186  | 

.247

|.  305 

.359 

.459 

.545  j.nl  < 

TABLE  4.1:  The  Empirical  C.D.F.,  Weibull/LSE  Fit,  and  the  Weibull/MLE  Fit  for 
Visibility  at  Mildenhall,  England,  February,  10  a.m. 
5.0 Summary and Conclusions

The  method  of  maximum  likelihood  and  a  least  squares  technique  have 
been  compared  under  a  variety  of  different  situations  when  the  purpose  of 
the  estimation  was  to  estimate  the  cdf.  When  the  probabilistic  model  used 
was  correct  or  nearly  correct  the  two  methods  produced  very  similar  results 
with  the  MLE  usually  slightly  better.  However,  when  the  model  used  was  wrong 
or  the  data  was  contaminated,  the  least  squares  technique  often  gave 
substantially  better  results. 

Thus,  it  appears  that  the  LSE  are  most  useful  when  the  underlying 
probability  distribution  is  not  clearly  established  or  when  the  sample 
information  has  possible  outliers.  In  these  situations  the  LS  model  exhibits 
a  great  deal  more  stability  or  robustness  than  the  ML  model. 

The maximum likelihood method is frequently the only method used for
parameter estimation. Our results are in agreement with the statements of
Berkson (1980) and LeCam (1980), which point out that maximum likelihood
procedures should not be used exclusively without regard to the purposes for
which the estimates are required.
FIGURE 3.2: Mixture of two normal distributions, 80% N(0,1) and 20% N(3,9)
(solid curve), with the MLE normal curve and the LSE normal curve.